Greedy Biomarker Discovery in the Genome with Applications to Antimicrobial Resistance

نویسندگان

  • Alexandre Drouin
  • Sébastien Giguère
  • Maxime Déraspe
  • François Laviolette
  • Mario Marchand
  • Jacques Corbeil
چکیده

The Set Covering Machine (SCM) is a greedy learning algorithm that produces sparse classifiers. We extend the SCM for datasets that contain a huge number of features. The whole genetic material of living organisms is an example of such a case, where the number of feature exceeds 10. Three human pathogens were used to evaluate the performance of the SCM at predicting antimicrobial resistance. Our results show that the SCM compares favorably in terms of sparsity and accuracy against L1 and L2 regularized Support Vector Machines and CART decision trees. Moreover, the SCM was the only algorithm that could consider the full feature space. For all other algorithms, the latter had to be filtered as a preprocessing step.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Proteomics Applications in Health: Biomarker and Drug Discovery and Food Industry

Advancing in genome sequencing has greatly propelled the understanding of the living world, however, it is insufficient for full description of a biological system. Focusing on, proteomics has emerged as another large-scale platform for improving the understanding of biology. Proteomic experiments can be used for different aspects of clinical and health sciences such as food technology, biomark...

متن کامل

Proteomics Applications in Health: Biomarker and Drug Discovery and Food Industry

Advancing in genome sequencing has greatly propelled the understanding of the living world, however, it is insufficient for full description of a biological system. Focusing on, proteomics has emerged as another large-scale platform for improving the understanding of biology. Proteomic experiments can be used for different aspects of clinical and health sciences such as food technology, biomark...

متن کامل

Acquired Antimicrobial Resistance Genes of Escherichia coli Obtained from Nigeria: In silico Genome Analysis

Background: Antimicrobial resistance is a global problem with enormous public health and economic impact. This study was carried out to get an overview of acquired antimicrobial resistance gene sequences in the genomes of Escherichia coli isolated from different food sources and the environment in Nigeria. Methods: To determine the acquired antimicrobial-resistant genes prevalence, genome asse...

متن کامل

Papaya Dieback in Malaysia: A StepTowards A New Insight of Disease Resistance

A recently published article describing the draft genome of Erwiniamallotivora BT-Mardi (1), the causal pathogen of papaya dieback infection in Peninsular Malaysia, hassignificant potential to overcome and reduce the effect of this vulnerable crop (2). The authors found that the draft genome sequenceis approximately 4824 kbp and the G+C content of the genomewas 52-54%, which is very similarto t...

متن کامل

نقش امیکس ها در تحقیقات بالینی- دارویی

Completion of the human genome project revealed that the molecular mechanisms of cellular responses to different situations cannot be predicted from the sequence of their genes. Hence, most biological researches are focused on the roles of proteins which are much more challenging tasks. Omics technologies, instead of analyzing individual components of an organism by conventional biochemical met...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1505.06249  شماره 

صفحات  -

تاریخ انتشار 2015